[TypeTransformer] Support frozen dataclasses #2823

Merged
merged 5 commits into from
Oct 23, 2024

Conversation

Future-Outlier
Member

@Future-Outlier Future-Outlier commented Oct 16, 2024

Tracking issue

flyteorg/flyte#5849

Why are the changes needed?

Some users want to use frozen dataclasses in their tasks.
Freezing prevents fields from being reassigned after construction, which protects task inputs and outputs from accidental mutation.

What changes are proposed in this pull request?

We propose using object.__setattr__ wherever setattr is used in the dataclass transformer. A frozen dataclass defines its own __setattr__ that raises FrozenInstanceError, which is how immutability is enforced. Since object is the base class of all classes, calling object.__setattr__ directly bypasses that override and works for both frozen and regular dataclasses.
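
A minimal sketch of the mechanism described above, using only the standard library. The class name `Point` is hypothetical and is not part of flytekit; the example only illustrates why normal assignment fails on a frozen dataclass while `object.__setattr__` succeeds.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:  # hypothetical example class, not from flytekit
    x: int
    y: int


p = Point(1, 2)

# Normal assignment goes through the frozen dataclass's own __setattr__,
# which raises FrozenInstanceError.
try:
    p.x = 10
    blocked = False
except FrozenInstanceError:
    blocked = True

# Calling object.__setattr__ directly skips the dataclass-level override,
# so the field is updated even though the class is frozen.
object.__setattr__(p, "x", 10)

print(blocked, p.x)
```

User code that assigns to a field the usual way is still rejected; only code that deliberately calls `object.__setattr__` can write to the instance.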

How was this patch tested?

Unit tests, local execution, and remote execution, using the following example:

from dataclasses import dataclass, field

from flytekit import task, ImageSpec, workflow
from flytekit.types.file import FlyteFile
flytekit_hash = "d11d0464110df4bac737554ec021ea86c0f9bcf9"
flytekit = f"git+https://github.com/flyteorg/flytekit.git@{flytekit_hash}"
image = ImageSpec(
    packages=[flytekit],
    apt_packages=["git"],
    registry="localhost:30000",
)

@dataclass(frozen=True)
class InnerInnerDC:
    x: int
    y: float

@dataclass(frozen=True)
class InnerDC:
    x: int
    y: float
    z: InnerInnerDC

@dataclass(frozen=True)
class FrozenDataclass:
    a: int = 1
    b: float = 2.0
    c: bool = True
    d: str = "hello"
    e: InnerDC = field(default_factory=lambda: InnerDC(1, 2.0, InnerInnerDC(3, 4.0)))
    ff: FlyteFile = field(default_factory=lambda: FlyteFile("s3://my-s3-bucket/example.txt"))

@task(container_image=image)
def get_frozen_dc() -> FrozenDataclass:
    return FrozenDataclass()

@task(container_image=image)
def get_frozen_attr(dc: FrozenDataclass) -> (int, float, bool, str):
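    try:
        # Assigning to a field of a frozen dataclass raises dataclasses.FrozenInstanceError.
        dc.a = dc.a + dc.a
    except Exception as e:
        print("Caught exception: ", e)
    # dc.a is unchanged because the assignment above was rejected.
    return dc.a, dc.b, dc.c, dc.d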

@workflow
def wf() -> (int, float, bool, str):
    dc = get_frozen_dc()
    return get_frozen_attr(dc=dc)

if __name__ == "__main__":
    from flytekit.clis.sdk_in_container import pyflyte
    from click.testing import CliRunner
    import os

    runner = CliRunner()
    path = os.path.realpath(__file__)
    # input_val = '{"a": 2, "b": 3.0, "c": false, "d": "world", "e": {"x": 2, "y": 3.0, "z": {"x": 4, "y": 5.0}}}'
    result = runner.invoke(pyflyte.main,
                           ["run", path, "wf", ])
    print("Local Execution: ", result.output)
    #
    result = runner.invoke(pyflyte.main,
                           ["run", "--remote", path, "wf"])
    print("Remote Execution: ", result.output)

Setup process


Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

@Future-Outlier Future-Outlier changed the title [TypeTransformer] Support frozen dataclasses [WIP] [TypeTransformer] Support frozen dataclasses Oct 16, 2024
@Future-Outlier Future-Outlier changed the title [WIP] [TypeTransformer] Support frozen dataclasses [TypeTransformer] Support frozen dataclasses Oct 22, 2024
Signed-off-by: Future-Outlier <[email protected]>
@@ -663,7 +663,7 @@ def _fix_structured_dataset_type(self, python_type: Type[T], python_val: typing.
         elif dataclasses.is_dataclass(python_type):
             for field in dataclasses.fields(python_type):
                 val = python_val.__getattribute__(field.name)
-                python_val.__setattr__(field.name, self._fix_structured_dataset_type(field.type, val))
+                object.__setattr__(python_val, field.name, self._fix_structured_dataset_type(field.type, val))
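
The pattern in this hunk (read each field, transform it, write it back) can be sketched outside flytekit roughly as follows. This is not the actual `_fix_structured_dataset_type` implementation: `rewrite_fields`, `convert`, `Inner`, and `Outer` are hypothetical names used only to show that writing back with `object.__setattr__` also works on nested frozen dataclasses.

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, Callable


def rewrite_fields(value: Any, convert: Callable[[Any], Any]) -> Any:
    """Apply `convert` to every leaf field, recursing into nested dataclasses."""
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        for f in dataclasses.fields(value):
            current = getattr(value, f.name)
            # object.__setattr__ works for both frozen and regular dataclasses.
            object.__setattr__(value, f.name, rewrite_fields(current, convert))
        return value
    return convert(value)


@dataclass(frozen=True)
class Inner:  # hypothetical
    y: float


@dataclass(frozen=True)
class Outer:  # hypothetical
    x: int
    inner: Inner


# Double every leaf value in place, including the nested frozen instance.
dc = rewrite_fields(Outer(1, Inner(2.0)), lambda v: v * 2)
print(dc)
```

With `python_val.__setattr__(...)` in place of `object.__setattr__(...)`, the same loop would raise `FrozenInstanceError` on the first field of a frozen instance.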
❤️

I learned it

@wild-endeavor
Contributor

@Future-Outlier can we just delete the _fix_structured_dataset_type function? If we need to keep it, could you please add a test that exercises only that function? I just want to see what it does. There's a comment on that function saying that it's only needed for Python 3.7 and 3.8. Is that true? If so, it's only needed for 3.8, since flytekit doesn't support 3.7 anymore. We can do a sys check and only invoke it for Python 3.8.

Also, is the _make_dataclass_serializable function still needed? Can we delete that too? I thought you made it so that all the Flyte types were serializable. There's a test that calls that function directly, and it says it's for backwards compatibility. Is it all just for that type(python_val) == str check, so that we allow people to write str for files and directories? If that's all this function does, could you please update the comment?

I think there might be an issue with line 680 - return self._make_dataclass_serializable(python_val, get_args(python_type)[0]), specifically with get_args(python_type)[0]. UnionTransformer.is_optional_type returns true for Union[None, MyDC], so in this case it'd just get None instead of MyDC?
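
The ordering concern raised above can be checked with the standard library alone. In this sketch, `MyDC` stands in for the dataclass named in the comment, and `first_non_none_arg` is a hypothetical helper showing one possible way to pick the non-None member; it is not flytekit's code.

```python
from dataclasses import dataclass
from typing import Optional, Union, get_args


@dataclass
class MyDC:
    x: int = 0


def first_non_none_arg(tp):
    """Return the first Union member that is not NoneType (hypothetical helper)."""
    return next(a for a in get_args(tp) if a is not type(None))


# get_args preserves the order in which the Union members were written.
print(get_args(Union[None, MyDC])[0])  # NoneType comes first here
print(get_args(Optional[MyDC])[0])     # MyDC comes first here
print(first_non_none_arg(Union[None, MyDC]))
```

Indexing with `[0]` therefore only yields the dataclass when None is written last, which is the case the comment is pointing at.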

@Future-Outlier
Member Author

Yes, I will do this today, thank you.

@Future-Outlier Future-Outlier merged commit f79f51d into master Oct 23, 2024
103 of 104 checks passed
@Future-Outlier
Member Author

I'll mentor @Mecoli1219 and @mao3267 to do this within this month, thank you!

@Future-Outlier
Member Author

I think there might be an issue with line 680 - return self._make_dataclass_serializable(python_val, get_args(python_type)[0]), specifically with get_args(python_type)[0]. UnionTransformer.is_optional_type returns true for Union[None, MyDC], so in this case it'd just get None instead of MyDC?

Agreed, thank you!

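from dataclasses import dataclass

from flytekit import task
from flytekit.types.file import FlyteFile

@dataclass
class InnerDC:
    ff: FlyteFile

@dataclass
class DC:
    inner_dc: InnerDC


@task
def t_dc() -> DC:
    # A plain str is passed where a FlyteFile is declared.
    return DC(inner_dc=InnerDC(ff="s3://path"))
    # -> DC(inner_dc=InnerDC(ff=FlyteFile("s3://path")))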
